Document upcoming MXNet training script format #390

Merged
20 commits merged into aws:master from the mxnet-script-mode-readme-warning branch on Sep 18, 2018

Conversation

laurenyu (Contributor)

Description of changes:
This change adds a warning to the README about the upcoming MXNet training script format, along with basic instructions on what changes will be needed once the new format is released.

I didn't use a ReST admonition for the warning because GitHub won't render them.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
In addition, the locations for finding input data and saving the model and output data will need to be defined.
Contributor

We pass these to the container as environment variables. I don't think users can override these in their training script. They can specify the S3 locations for these, but since we handle downloading the data and mounting the shared partition for the container, they have to use the paths in the environment variables. I think we should explain this a little better and add a reference to the environment variables in sagemaker-containers. There is a list in its README.

Contributor Author

We store the locations in environment variables, but we do rely on the user to read those environment variables in their script, so they do have to define variables with those locations. I can add an explanation about the environment variables, though.
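
For illustration, a minimal sketch of what reading those locations might look like in a user's script. The SM_* variable names below follow the list in the sagemaker-containers README; the 'train' channel name is an assumption and depends on how the estimator's channels are configured.

import os

# locations the container exposes as environment variables
model_dir = os.environ['SM_MODEL_DIR']              # where to save model artifacts
output_data_dir = os.environ['SM_OUTPUT_DATA_DIR']  # where to write output data
train_dir = os.environ['SM_CHANNEL_TRAIN']          # input data for a channel named 'train'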

Contributor

Suggest:

In addition, you need to define the locations of where to get the input data and where to save the model artifacts and output data.

parser.add_argument('--learning-rate', type=float, default=0.1)

# input data and model directories
parser.add_argument('--model-dir', type=str, default='/opt/ml/model')
Contributor

We should use the environment variables to set the args here as well.
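
As a rough sketch of that suggestion, with the SM_* environment variable names assumed from sagemaker-containers (the '--train' argument and its 'train' channel are hypothetical):

import argparse
import os

parser = argparse.ArgumentParser()

# hyperparameters arrive as command-line arguments
parser.add_argument('--learning-rate', type=float, default=0.1)

# input data and model directories, defaulted from the environment
# variables set by the container
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))

args, _ = parser.parse_known_args()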

icywang86rui previously approved these changes Sep 17, 2018
@@ -599,13 +599,16 @@ The code executed from your main guard needs to:
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
Contributor

Suggest replacing "will be" with "are".

@@ -599,13 +599,16 @@ The code executed from your main guard needs to:
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
-In addition, the locations for finding input data and saving the model and output data will need to be defined.
+In addition, the locations for finding input data and saving the model and output data will be provided as environment variables rather than as arguments to a function.
Contributor

Suggestion:

In addition, you specify the locations of input data and where to save the model artifacts and output data as environment variables in the container, rather than as arguments to a function.

Contributor Author

The container provides this info to the user, not the other way around. Would this be clearer?

In addition, the container will define the locations of input data and where to save the model artifacts and output data as environment variables rather than passing that information through train.

Contributor

Minor change:

...rather than passing that information as arguments to the train function.

2. Initiate training
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
Contributor

Suggest replacing "will be" with "are"

Contributor Author

I used future tense because these instructions are going to be live for a while before the changes themselves are released - I'm afraid present tense might be too confusing

Contributor

Fair enough

args, _ = parser.parse_known_args()

The code in the main guard should also take care of training and saving the model.
This can be as simple as just calling the methods used with the previous training script format:
Contributor

Suggest:

This can be as simple as calling the train and save methods used in the previous training script format.

Contributor Author

changed
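
For context, a minimal sketch of such a main guard. The train and save functions are hypothetical stand-ins for whatever the user's existing script already defines, and the SM_* environment variable names are assumed from sagemaker-containers.

import argparse
import os


def train(learning_rate, train_dir):
    ...  # user's existing training logic goes here


def save(model, model_dir):
    ...  # user's existing model-saving logic goes here


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning-rate', type=float, default=0.1)
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args, _ = parser.parse_known_args()

    # wire the parsed arguments into the existing train/save functions
    model = train(args.learning_rate, args.train)
    save(model, args.model_dir)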

eslesar-aws previously approved these changes Sep 17, 2018
laurenyu merged commit cb184cb into aws:master Sep 18, 2018
laurenyu deleted the mxnet-script-mode-readme-warning branch September 18, 2018 16:36
pdasamzn pushed a commit to pdasamzn/sagemaker-python-sdk that referenced this pull request Nov 1, 2018
The next major release of MXNet will change the
training script format. This README change documents the
changes needed by the user to adjust to the new format.
This is currently just a warning as the new format is not out yet.
The warning is meant to help users plan for the upcoming change.